Methods for Identifying Driver Pathways in Cancer

Author

  • Mark DM LEISERSON
Abstract

Distinguishing the somatic mutations responsible for cancer (driver mutations) from random passenger mutations is a key challenge in cancer genomics. Driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. This heterogeneity complicates the identification of driver mutations by their recurrence across samples, as different combinations of mutations in driver pathways are observed in different samples. We introduce the Multi-Dendrix algorithm for the simultaneous identification of multiple driver pathways de novo in somatic mutation data from a cohort of cancer samples. The algorithm relies on two combinatorial properties of mutations in a driver pathway: high coverage and mutual exclusivity. We derive an integer linear program that finds sets of mutations exhibiting these properties. We apply Multi-Dendrix to somatic mutations from glioblastoma, breast cancer, and lung cancer samples. Multi-Dendrix identifies sets of mutations in genes that overlap with known pathways – including Rb, p53, PI(3)K, and cell cycle pathways – and also novel sets of mutually exclusive mutations, including mutations in several transcription factors or other genes involved in transcriptional regulation. These sets are discovered directly from mutation data with no prior knowledge of pathways or gene interactions. We show that Multi-Dendrix outperforms other algorithms for identifying combinations of mutations and is also orders of magnitude faster on genome-scale data. Software available at: http://compbio.cs.brown.edu/software.

Author Summary

Cancer is a disease driven largely by the accumulation of somatic mutations during the lifetime of an individual. The declining costs of genome sequencing now permit the measurement of somatic mutations in hundreds of cancer genomes. A key challenge is to distinguish driver mutations responsible for cancer from random passenger mutations.
This challenge is compounded by the observation that different combinations of driver mutations are observed in different patients with the same cancer type. One reason for this heterogeneity is that driver mutations target signaling and regulatory pathways which have multiple points of failure. We introduce an algorithm, Multi-Dendrix, to find these pathways solely from patterns of mutual exclusivity between mutations across a cohort of patients. Unlike earlier approaches, we simultaneously find multiple pathways, an essential feature for analyzing cancer genomes where multiple pathways are typically perturbed. We apply our algorithm to mutation data from hundreds of glioblastoma, breast cancer, and lung adenocarcinoma patients. We identify sets of interacting genes that overlap known pathways, and gene sets containing subtype-specific mutations. These results show that multiple cancer pathways can be identified directly from patterns in mutation data, and provide an approach to analyze the ever-growing cancer mutation datasets.

∗ The work in this section has been accepted for publication by the journal PLoS Comp Bio.

Introduction

Cancer is a disease driven in part by somatic mutations that accumulate during the lifetime of an individual. The declining costs of genome sequencing now permit the measurement of these somatic mutations in large numbers of cancer genomes. Projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) are now undertaking this task in hundreds of samples from dozens of cancer types. A key challenge in interpreting these data is to distinguish the functional driver mutations important for cancer development from random passenger mutations that have no consequence for cancer. The ultimate determinant of whether a mutation is a driver or a passenger is to test its biological function.
However, because the ability to detect somatic mutations currently far exceeds the ability to experimentally validate their function, computational approaches that predict driver mutations are an urgent priority. One approach is to directly predict the functional impact of somatic mutations using additional biological knowledge from evolutionary conservation, protein structure, etc., and a number of methods implementing this approach have been introduced (see [1–4]). These methods are successful in predicting the impact of some mutations, but generally do not integrate information across different types of mutations (single nucleotide, indels, larger copy number aberrations, etc.); moreover, these methods are less successful for less conserved/studied proteins.

Given the declining costs of DNA sequencing, a standard approach to distinguish driver from passenger mutations is to identify recurrent mutations, whose observed frequency in a large cohort of cancer patients is much higher than expected [5, 6]. Nearly all cancer genome sequencing papers, including those from TCGA [7–10] and other projects [5, 11, 12], report a list of significantly mutated genes. However, driver mutations vary greatly between cancer patients – even those with the same (sub)type of cancer – and this heterogeneity significantly reduces the statistical power to detect driver mutations by tests of recurrence.

One of the main biological explanations for this mutational heterogeneity is that driver mutations target not only individual genomic loci (e.g. nucleotides or genes), but also groups of genes in cellular signaling and regulatory pathways. Consequently, different cancer patients may harbor mutations in different members of a pathway important for cancer development. Thus, in addition to testing individual loci, or genes, for recurrent mutation in a cohort of patients, researchers also test whether groups of genes are recurrently mutated.
Since exhaustive testing of all groups of genes is not possible without prohibitively large sample sizes (due to the necessary multiple hypothesis testing correction), current approaches focus on groups of genes defined by prior biological knowledge, such as known pathways (e.g. from KEGG [13]) or functional groups (e.g. from GO [14]), and methods have been introduced to look for enrichment in such pre-defined groups of genes (e.g. [15–17]). More recently, methods that identify recurrently mutated subnetworks in protein-protein interaction networks have also been developed, such as NetBox [18], MeMO [19], HotNet [20], and EnrichNet [21].

Knowledge of gene and protein interactions in humans remains incomplete, and most existing pathway databases and interaction networks do not precisely represent the pathways and interactions that occur in a particular cancer cell. Thus, restricting attention to only those combinations of mutations recorded in these data sources may limit the possibility for novel biological discoveries. Algorithms that do not make this restriction, but also avoid the multiple hypothesis testing problems associated with exhaustive enumeration, are therefore desirable.

Recently, the RME [22] and De novo Driver Exclusivity (Dendrix) [23] algorithms were introduced to discover driver pathways using combinatorial constraints derived from biological knowledge of how driver mutations appear in pathways [24, 25]. In particular, each cancer patient contains a relatively small number of driver mutations, and these mutations perturb multiple cellular pathways. Thus, each driver pathway will contain approximately one driver mutation per patient. This leads to a pattern of mutual exclusivity between mutations in different genes in the pathway. In addition, an important driver pathway should be mutated in many patients, or have high coverage by mutations.
Thus, driver pathways correspond to sets of genes that are mutated in many patients, but whose mutations are mutually exclusive, or approximately so. We emphasize that the driver pathways exhibiting patterns of mutual exclusivity and high coverage are generally smaller and more focused than most pathways annotated in the literature and pathway databases. The latter typically contain many genes and perform multiple different functions; e.g. the “cell cycle” pathway in KEGG contains 143 genes. It is well known that co-occurring (i.e., not exclusive) mutations are observed in these larger, multifunctional biological pathways [25].

The RME and Dendrix algorithms use different approaches to find sets of genes with high coverage and mutual exclusivity: RME builds sets of genes from pairwise scores of exclusivity, while Dendrix computes a single score for the mutual exclusivity of a set of genes, and finds the highest scoring set. The aforementioned MeMO algorithm [19] also considers mutual exclusivity between mutations, but only for pairs of genes that have recorded interactions in a protein-protein interaction network. Thus, MeMO does not attempt to identify driver pathways de novo and can only define subnetworks in existing interaction networks. While many of the strongest signals of mutual exclusivity are between genes with known interactions, below we show examples in cancer data of mutually exclusive mutations between genes with no known direct interactions.

The two existing de novo algorithms, RME and Dendrix, consider the detection of only a single driver pathway from the pattern of mutual exclusivity between mutations. However, it is well known that mutations in several pathways are generally required for cancer [26]. There is little reason to assume that mutations in different pathways will be mutually exclusive; on the contrary, they may exhibit significant patterns of co-occurrence across patients.
Multiple pathways may be discovered using these algorithms by running the algorithm iteratively, removing the genes found in each previous iteration, and such an approach was employed for Dendrix [23]. However, such an iterative approach is not guaranteed to yield the optimal set of pathways.

Here we extend the Dendrix algorithm in three ways. First, we formulate the problem of finding exclusive, or approximately exclusive, sets of genes with high coverage as an integer linear program (ILP). This formulation allows us to find optimal driver pathways of various sizes directly, in contrast to the greedy approximation and Markov Chain Monte Carlo algorithms employed in Dendrix. Second, we generalize the ILP to simultaneously find multiple driver pathways. Third, we augment the core algorithm with additional analyses including: examining gene sets for subtype-specific mutations, summarizing the stability of results across different numbers and sizes of pathways, and imposing greater exclusivity of gene sets.

We apply the new algorithm, called Multi-Dendrix, to four somatic mutation datasets: whole-exome and copy number array data in 261 glioblastoma (GBM) patients from The Cancer Genome Atlas (TCGA) [7]; whole-exome and copy number array data in 507 breast cancer (BRCA) patients from TCGA [8]; 601 sequenced genes in 84 patients with glioblastoma multiforme (GBM) from TCGA [7]; and 623 sequenced genes in 188 patients with lung adenocarcinoma [27]. In each dataset Multi-Dendrix finds biologically interesting groups of genes that are highly exclusive, and where each group is mutated in many patients. In all datasets these include groups of genes that are members of known pathways critical to cancer development including: Rb, p53, and RTK/RAS/PI(3)K signaling pathways in GBM and p53 and PI(3)K/AKT signaling in breast cancer.
Multi-Dendrix successfully recovers these pathways solely from the pattern of mutual exclusivity and without any prior information about the interactions between these genes. Moreover, Multi-Dendrix also identifies mutations that are mutually exclusive with these well-known pathways, and potentially represent novel interactions or crosstalk between pathways. Notable examples include mutual exclusivity between: mutations in the PI(3)K signaling pathway and amplification of PRDM2 (and PDPN) in glioblastoma; and mutations in p53, GATA3 and cadherin genes in breast cancer.

Finally, we compare Multi-Dendrix to an alternative approach of iteratively applying Dendrix [23] or RME [22], two other algorithms that search for mutually exclusive sets. We show that these iterative approaches typically fail to find an optimal set of pathways on simulated data, while Multi-Dendrix finds the correct pathways even in the presence of a large number of false positive mutations. On real cancer sequencing data, the groups of genes found by Multi-Dendrix include more genes with known biological interactions. Moreover, Multi-Dendrix is orders of magnitude faster than these other algorithms, allowing Multi-Dendrix to scale to the latest whole-exome datasets on hundreds of samples, which are largely beyond the capabilities of Dendrix and RME. Multi-Dendrix is a novel and practical approach to finding multiple groups of mutually exclusive mutations, and complements other approaches that predict combinations of driver mutations using biological knowledge of pathways, interaction networks, protein structure, or protein sequence conservation.

Results

Multi-Dendrix algorithm

The Multi-Dendrix algorithm takes somatic mutation data from m cancer patients as input, and identifies multiple sets of mutations, where each set satisfies two properties: (1) the set has high coverage, with many patients having a mutation in the set; (2) the set exhibits a pattern of mutual exclusivity, where most patients have exactly one mutation in the set. We briefly describe the Multi-Dendrix algorithm here. Further details are provided in the Methods section below.

We assume that somatic mutations have been measured in m cancer patients and that these mutations are divided into n different mutation classes. A mutation class is a grouping of different mutation types at a specific genomic locus. In the simplest case, a mutation class corresponds to a grouping of all types of mutations (single nucleotide variants, copy number aberrations, etc.) in a single gene. We represent the somatic mutation data as an m × n binary mutation matrix A, where the entry A_ij is defined as follows:

$$A_{ij} = \begin{cases} 1 & \text{if gene } j \text{ is mutated in patient } i \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

More generally, a mutation class may be defined for an arbitrary genomic locus, and not just a gene, and may distinguish different types of mutations. For example, one may define a mutation class as single-nucleotide mutations in an individual residue in a protein sequence or in a protein domain. Alternatively, one may separate different types of mutations in a gene (e.g. single-nucleotide mutations, deletions, or amplifications) by creating separate mutation classes for each mutation type in each gene. We will use this latter definition of mutation classes in the results below. For ease of exposition we will assume for the remainder of this section that each mutation class is a gene.

Vandin et al. [23] formulate the problem of finding a set of genes with high coverage and high exclusivity as the Maximum Weight Submatrix Problem.
Here the weight W(M) = |Γ(M)| − ω(M) of a set M of genes is the difference between the coverage |Γ(M)|, the number of patients with a mutation in one of the genes in M, and the coverage overlap ω(M), the number of patients having a mutation in more than one gene in M. Vandin et al. [23] introduce the De novo Driver Exclusivity (Dendrix) algorithm, which finds a set M of k genes with maximum weight W(M).

While finding single driver pathways is important, most cancer patients are expected to have driver mutations in multiple pathways. Dendrix used a greedy iterative approach to find multiple gene sets (described below), which is not guaranteed to find optimal gene sets. Identification of multiple driver pathways requires a criterion to evaluate possible collections of gene sets. Appealing to the same biological motivation as above, we expect that each pathway contains approximately one driver mutation. Moreover, since each driver pathway is important for cancer development, we also expect that most individuals contain a driver mutation in most driver pathways. Thus, we expect high exclusivity within the genes of each pathway and high coverage of each pathway on its own. One way to satisfy these criteria is to find a collection M = {M1, M2, ..., Mt} of gene sets whose sum of weights, W′(M) = W(M1) + W(M2) + ... + W(Mt), is maximized. We define the Multiple Maximum Weight Submatrices problem as the problem of finding such a maximum weight collection.

We solve the Multiple Maximum Weight Submatrices problem using an integer linear program (ILP), and refer to the resulting algorithm as Multi-Dendrix (see Methods). In addition, the ILP formulation used in Multi-Dendrix uses a modified weight function Wα(M) = |Γ(M)| − αω(M), where α > 0 is a parameter that adjusts the tradeoff between finding sets with higher coverage |Γ(M)| (more patients with a mutation) versus higher coverage overlap ω(M) (greater non-exclusivity between mutations). We use this parameter in the breast cancer dataset below.
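
As an illustration, the weight Wα(M) described above can be computed directly from a binary mutation matrix. The following is a minimal Python sketch, not the authors' implementation; the function names `weight` and `collection_weight` and the toy matrix are hypothetical. It follows the wording above and counts each patient with more than one mutation in M once toward ω(M); a convention that counts every additional mutation would give a different value for patients with three or more mutations in M.

```python
def weight(A, gene_set, alpha=1.0):
    """W_alpha(M) = |Gamma(M)| - alpha * omega(M) for the columns in gene_set.

    A is a list of rows (patients); each row is a list of 0/1 entries (genes).
    """
    covered = 0  # |Gamma(M)|: patients with at least one mutation in M
    overlap = 0  # omega(M): patients with more than one mutation in M
    for row in A:
        hits = sum(row[j] for j in gene_set)
        if hits >= 1:
            covered += 1
        if hits >= 2:
            overlap += 1
    return covered - alpha * overlap


def collection_weight(A, gene_sets, alpha=1.0):
    """Sum of the weights of the gene sets in a collection."""
    return sum(weight(A, M, alpha) for M in gene_sets)


# Toy matrix (made up for illustration): 5 patients x 4 genes.
A = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

print(weight(A, [0, 1, 2]))             # 4 covered, 1 overlapping patient
print(weight(A, [0, 1, 2], alpha=2.5))  # overlap penalized more heavily
print(collection_weight(A, [[0, 1, 2], [3]]))
```

Raising α above 1 penalizes overlap more strongly and therefore favors more exclusive sets, at the cost of coverage.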
In contrast, Dendrix was limited to α = 1.

Simulated data

We compare Multi-Dendrix to iterative versions of Dendrix [23] and RME [22] on simulated mutation data with both driver mutations implanted in pathways in a mutually exclusive manner and random passenger mutations. The goal of these simulations is to compare Multi-Dendrix to other algorithms that identify mutually exclusive genes on straightforward datasets that contain multiple mutually exclusive sets. We generate mutation data for m = 160 patients and n = 360 genes as follows. We select a set of four pathways P = (P1, P2, P3, P4) with each Pi containing four genes. We select the coverage |Γ(Pi)| uniformly from the following intervals: [0.75m, 0.9m], [0.6m, 0.75m], [0.45m, 0.6m], [0.3m, 0.45m], respectively. The size of this dataset and the varying coverages of the pathways model what is observed in real data (see § Somatic Mutation data) and are consistent with models of mutation progression where driver mutations accumulate in pathways [28]. For each pathway Pi, we select |Γ(Pi)| patients at random and add a driver mutation to exactly one gene from the set Pi. Thus, the driver mutations in each pathway are mutually exclusive. We then add passenger mutations by randomly mutating genes in each patient with probability q, the passenger mutation probability. We used values of q similar to our estimates for q on the TCGA GBM and Lung cancer datasets (in § Somatic Mutation data below), which were q = 0.001 and q = 0.0005, respectively. We emphasize that these simulations do not model all of the complexities of somatic mutations in cancer, e.g. gene-specific and patient-specific mutation rates, genes present in multiple pathways, etc.

Since the Dendrix and RME algorithms are designed to find single pathways, we compared Multi-Dendrix to iterative versions of these methods that return multiple gene sets.
For Dendrix we used the iterative approach described in [23]: apply Dendrix to find a highest scoring gene set, remove those genes from the dataset, and apply Dendrix to the reduced dataset, repeating these steps until a desired number t of gene sets are found. We will refer to this algorithm as Iter-Dendrix. Thus, Iter-Dendrix returns a collection P = (P1, P2, ..., Pt) of t gene sets such that W(P1) ≥ W(P2) ≥ ··· ≥ W(Pt). We implemented the analogous iterative version of RME, and will refer to this algorithm as Iter-RME. We compared the collection M of gene sets found by each algorithm to the planted pathways P, computing the symmetric difference d(P, M) between M and P as described in Methods.

Table 1 shows a comparison of Multi-Dendrix, Iter-Dendrix, and Iter-RME on simulated mutation data for different values of q. Note that we do not show comparisons to Iter-RME for q ≥ 0.005, as Iter-RME did not complete within 24 hours of runtime for any of the 1000 simulated mutation datasets. While the RME publication [22] analyzed mutation matrices with thousands of genes and hundreds of patients, this analysis (and the released RME software) required that mutations were present in at least 10% of the samples, greatly reducing the number of genes/samples input to the algorithm. In fact, a threshold of 10% will remove nearly all genes in current whole-exome studies (see § Comparison of Multi-Dendrix and RME). For 0.0005 ≤ q ≤ 0.015, Multi-Dendrix identifies collections of gene sets that were significantly closer (p < 0.01) to the planted pathways P than the collections found by either Iter-Dendrix or Iter-RME. These results demonstrate that Multi-Dendrix outperforms other methods, even when the passenger mutation probability q is more than 15 times greater than the value estimated from real somatic mutation data. For q ≤ 0.0001, the differences between Multi-Dendrix and Iter-RME were not significant.
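
The simulation procedure described in this section can be sketched in a few lines of Python. This is an illustrative reconstruction from the text, not the authors' code: the function name `simulate_mutation_matrix`, the choice of the first 16 columns as pathway genes, the rounding of the interval endpoints, and the seed handling are all assumptions.

```python
import random


def simulate_mutation_matrix(m=160, n=360, q=0.001, seed=0):
    """Return (A, pathways): an m x n binary matrix with four planted pathways."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(m)]

    # Assumption: the planted pathways occupy columns 0-3, 4-7, 8-11, 12-15.
    pathways = [list(range(4 * i, 4 * i + 4)) for i in range(4)]
    # Coverage intervals from the text, as fractions of m.
    intervals = [(0.75, 0.90), (0.60, 0.75), (0.45, 0.60), (0.30, 0.45)]

    for genes, (lo, hi) in zip(pathways, intervals):
        # Draw the coverage uniformly from the interval (endpoints rounded).
        n_covered = rng.randint(round(lo * m), round(hi * m))
        # Each selected patient gets a driver mutation in exactly one gene,
        # so driver mutations within a pathway are mutually exclusive.
        for i in rng.sample(range(m), n_covered):
            A[i][rng.choice(genes)] = 1

    # Passenger mutations: each (patient, gene) entry is mutated with probability q.
    for i in range(m):
        for j in range(n):
            if rng.random() < q:
                A[i][j] = 1

    return A, pathways


A, pathways = simulate_mutation_matrix(q=0.001, seed=42)
for P in pathways:
    coverage = sum(1 for row in A if any(row[j] for j in P))
    print(P, coverage)
```

Because passenger mutations are added to every gene, they can also fall on pathway genes, so exclusivity within a planted pathway is only approximate once q > 0.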
We also compared the runtimes of each algorithm on the simulated datasets. Multi-Dendrix was several orders of magnitude faster than Iter-Dendrix and Iter-RME on all datasets (Table 2). Note that as the passenger mutation probability q increases, the number of recurrently mutated passenger genes increases. Multi-Dendrix scales much better than Iter-RME and maintains a significant advantage over Iter-Dendrix, completing all simulated datasets in less than 5 seconds.

We evaluated how the runtime of Multi-Dendrix scales to larger datasets. Using the same passenger mutation probabilities 0.0001 ≤ q ≤ 0.02 listed above, we calculated the average runtime in seconds of Multi-Dendrix for ten simulated mutation matrices with m = 100, 200, 400, 800, 1600, 3200, 6400, 12800, 22000 genes and n = 1000 patients, more than the number of patients to be measured in any cancer study from TCGA. In each case, we run Multi-Dendrix only on the subset of genes that are mutated in more than the expected number nq of samples. For the largest dataset with m = 22000 genes, the average numbers of genes input to Multi-Dendrix for the highest and lowest passenger mutation probabilities are ∼9700 and ∼2100, respectively. (Table S1 shows the average number of input genes for varying m and q.) The average runtime for this largest dataset is under one hour (average of 54.4 minutes). Figure S1 shows the runtimes for varying m and q.
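
To see why an iterative scheme of the kind used as a baseline in this section can miss the best collection, consider the following toy Python sketch. It is an assumption-laden illustration: exhaustive enumeration stands in for both the Dendrix search and the Multi-Dendrix ILP (neither is implemented here), all sets have the same fixed size k, and the matrix is made up.

```python
from itertools import combinations


def weight(A, gene_set):
    """W(M): patients covered minus patients with more than one mutation in M."""
    hits = [sum(row[j] for j in gene_set) for row in A]
    return sum(h >= 1 for h in hits) - sum(h >= 2 for h in hits)


def iterative_sets(A, k, t):
    """Greedy scheme: take the best k-set, remove its genes, repeat t times."""
    remaining = list(range(len(A[0])))
    found = []
    for _ in range(t):
        best = max(combinations(remaining, k), key=lambda M: weight(A, M))
        found.append(best)
        remaining = [g for g in remaining if g not in best]
    return found


def optimal_collection(A, k, t):
    """Brute force over all collections of t disjoint k-sets (tiny inputs only)."""
    best, best_w = None, float("-inf")
    for coll in combinations(combinations(range(len(A[0])), k), t):
        used = [g for M in coll for g in M]
        if len(used) != len(set(used)):
            continue  # gene sets in a collection must be disjoint
        w = sum(weight(A, M) for M in coll)
        if w > best_w:
            best, best_w = list(coll), w
    return best, best_w


# Toy data: genes 0 and 2 are each mutated in three distinct patients,
# genes 1 and 3 are mutated together in two further patients.
A = [[1, 0, 0, 0]] * 3 + [[0, 0, 1, 0]] * 3 + [[0, 1, 0, 1]] * 2

greedy = iterative_sets(A, k=2, t=2)
print(greedy, sum(weight(A, M) for M in greedy))  # [(0, 2), (1, 3)] 6
print(optimal_collection(A, k=2, t=2))            # ([(0, 1), (2, 3)], 10)
```

The greedy first step takes the single best pair (0, 2), which leaves only the fully overlapping pair (1, 3); optimizing the sum of weights jointly gives a total of 10 instead of 6.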

The Multi-Dendrix Computational Pipeline

We incorporate the Multi-Dendrix algorithm into a larger pipeline (Figure 1) that includes several additional pre- and post-processing tasks including: (1) Building mutation matrices for input into Multi-Dendrix; (2) Summarizing Multi-Dendrix results over multiple values for the parameters t, the number of gene sets, kmin, the minimum size of a gene set, and kmax, the maximum size of a gene set; (3) Evaluating the statistical significance of results; (4) Examining Multi-Dendrix results for mutually exclusive sets resulting from subtype-specific mutations. We describe these steps briefly below, with further details in the Methods and Supporting Information.

First, we build mutation matrices A from somatic mutation data. We use several steps to process single-nucleotide variant (SNV) data and copy number variant (CNV) data, and to combine both types of data.

Second, in contrast to simulated data, on real data we do not know the correct values of the parameters t, kmin, and kmax. Thus, we consider a reasonable range of values for these parameters and summarize the results over these parameters into modules. We build a graph, where the nodes are individual genes (or mutation classes) and edges connect genes (respectively mutation classes) that appear in the same gene set for more than one value of the parameters. We weight each edge with the fraction of parameter values for which the pair of genes appear in the same gene set. The resulting edge-weighted graphs provide a measure of the stability of the resulting gene sets over different parameter values. By choosing a minimum edge weight, we partition the graph into connected components, or modules. One may choose to use these modules as the output of Multi-Dendrix.

Third, we evaluate the statistical significance of our results using two measures.
Since a collection M with high weight W′(M) may not be surprising in a large mutation matrix A, the first measure evaluates the significance of the score W′(M) maximized by Multi-Dendrix. We evaluate whether the weight W′(M∗) of the maximum weight collection M∗ output by Multi-Dendrix is significantly large compared to an empirical distribution of the maximum weight sets from randomly permuted mutation data. We generate random mutation data using the permutation test described in [19]. This test permutes the mutations among the genes in each patient, preserving both the number of mutated genes in each patient and the number of patients with a mutation in each gene while perturbing any patterns of exclusivity between mutated genes. Note that this permutation test requires running Multi-Dendrix many times to determine statistical significance for a single parameter setting. Thus, the runtime advantages of Multi-Dendrix compared to Iter-Dendrix and Iter-RME are very important in practice on real datasets.

Next, we evaluate whether the collection M∗ output by Multi-Dendrix contains more protein-protein interactions than expected by chance by applying our direct interactions test on a PPI network constructed from the union of the KEGG and iRefIndex PPI networks. The direct interactions test computes a statistic ν of the difference in the number of interactions within and between gene sets in M∗, and compares the observed value of ν to an empirical distribution on 1000 permuted PPI networks (full details of the test are in § Evaluating known interactions). These permuted networks account for the observation that many genes that are frequently mutated in cancer also have large degree in the interaction network, either due to biological reasons or ascertainment bias. We use an interaction network to assess biological function rather than known pathways (e.g. KEGG pathways or GSEA sets) because most of these pathways are relatively large, while the gene sets found by Multi-Dendrix that exhibit exclusivity tend to be much smaller, each containing only a few genes.

Finally, we examine possible correlations between the mutually exclusive sets reported by Multi-Dendrix and particular subsets of samples. A number of cancers are divided into subtypes according to pathology, cytogenetics, gene expression, or other features. Since mutations that are specific to particular subtypes will be mutually exclusive, disease heterogeneity is an alternative explanation to pathways for observed mutually exclusive sets. For example, [29] report four subtypes of GBM based on gene expression clusters, and show that several mutations – including IDH1, PDGFRA, EGFR, and NF1 – have strong association with individual subtypes. Unfortunately, if the subtypes are unknown, there is no information that allows Multi-Dendrix, Dendrix, RME, or other algorithms that analyze mutual exclusivity to distinguish between mutual exclusivity resulting from subtypes and mutual exclusivity resulting from pathways or other causes. If subtypes are known, two possible solutions are to analyze subtypes separately, or to examine whether patterns of mutual exclusivity are associated with these subtypes. We annotate results by known subtypes as a post-processing step in Multi-Dendrix.
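
The module-summarization step of the pipeline (the second step above) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline's actual code: the function name `build_modules`, the input layout (one collection of gene sets per parameter setting), and the gene labels A to H are hypothetical placeholders rather than real results.

```python
from collections import defaultdict
from itertools import combinations


def build_modules(results, min_weight):
    """Summarize collections from several parameter settings into modules.

    results: list of collections, one per parameter setting; each collection
             is a list of gene sets.
    Returns (edges, modules): edge weights for gene pairs that co-occur for
    more than one setting, and the connected components of the graph that
    keeps only edges with weight >= min_weight.
    """
    counts = defaultdict(int)
    for collection in results:
        pairs = set()
        for gene_set in collection:
            pairs.update(combinations(sorted(gene_set), 2))
        for pair in pairs:
            counts[pair] += 1

    # Edge weight: fraction of parameter settings in which the pair co-occurs.
    edges = {p: c / len(results) for p, c in counts.items() if c > 1}

    # Adjacency lists restricted to edges above the chosen threshold.
    adj = defaultdict(set)
    for (u, v), w in edges.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)

    # Connected components by depth-first search.
    modules, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            g = stack.pop()
            if g not in comp:
                comp.add(g)
                stack.extend(adj[g] - comp)
        seen |= comp
        modules.append(sorted(comp))
    return edges, modules


# Placeholder output for three parameter settings (not real data).
results = [
    [{"A", "B", "C"}, {"D", "E", "F"}],
    [{"A", "B", "C"}, {"D", "E", "G"}],
    [{"A", "B", "H"}, {"D", "E", "F"}],
]
edges, modules = build_modules(results, min_weight=0.5)
print(modules)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Raising `min_weight` keeps only the most stable pairs; with `min_weight=0.9` the same input yields the modules `['A', 'B']` and `['D', 'E']`.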

Somatic Mutation data

We applied Multi-Dendrix and Iter-Dendrix to four somatic mutation matrices: (1) copy number variants (CNVs), small indels, and non-synonymous single nucleotide variants (SNVs) measured in 601 genes in 84 glioblastoma multiforme (GBM) patients [7]; (2) indels and non-synonymous single nucleotide variants in 623 sequenced genes in 188 lung adenocarcinoma patients [27]; (3) CNVs, small indels, and non-synonymous SNVs measured using whole-exome sequencing and copy number arrays in 261 GBM patients [7]; and (4) CNVs, small indels, and non-synonymous SNVs measured in 507 BRCA patients. We will refer to these datasets as GBM(2008), Lung, GBM, and BRCA below. We removed extremely low frequency mutations and known outliers from these datasets as described in Methods. After this processing, the GBM(2008) dataset contained mutation and CNV data for 46 genes in 84 patients; the Lung dataset contained somatic mutation data for 190 genes in 163 patients; the GBM dataset contained mutation and CNV data for 398 genes in 261 patients; and the BRCA dataset contained mutation and CNV data for 375 genes in 507 patients.

We focus here on presenting results from the latter two datasets because they are the latest whole-genome/exome datasets and most representative of the datasets that are now being produced and will be analyzed in the coming years. Results with the first two older and smaller datasets from targeted sequencing are described in the Supporting Information. We compute 2 ≤ t ≤ 4 gene sets, each of minimum size kmin = 3 and maximum size in the range 3 ≤ kmax ≤ 5. We summarize the results over these 9 different parameter values into modules using the procedure described above.

Publication date: 2013